Sentence Selection by Direct Likelihood Maximization for Language Model Adaptation
Abstract
A general framework for language model task adaptation is to select documents from a large training set based on a language model estimated on development data. However, this strategy has the deficiency that the selected documents are biased toward the most frequent patterns in the development data. To address this problem, a new task adaptation method is proposed that selects documents from the training set so as to directly reduce the perplexity on the development set. Moreover, a weighting method that modifies the perplexity objective function is proposed to improve generalization to unseen data. The proposed adaptation methods are evaluated in large-vocabulary speech recognition experiments. The proposed adaptation with the weighting term is shown to produce a compact model that gives consistently lower word error rates across different tasks.
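The idea of selecting training text so as to directly reduce development-set perplexity can be illustrated with a minimal sketch. The abstract does not give the model or the search procedure, so everything below is an assumption made for illustration: a unigram model with add-alpha smoothing, a greedy one-sentence-at-a-time search, and the hypothetical names `dev_log_likelihood` and `greedy_select`. The weighting term mentioned in the abstract is not specified there and is not modeled here.

```python
import math
from collections import Counter


def dev_log_likelihood(counts, total, dev_counts, vocab_size, alpha=1.0):
    """Log-likelihood of the development data under an add-alpha unigram model.

    Assumption: a unigram model stands in for whatever LM the paper uses.
    """
    denom = total + alpha * vocab_size
    return sum(
        c * math.log((counts.get(w, 0) + alpha) / denom)
        for w, c in dev_counts.items()
    )


def greedy_select(pool, dev, max_sentences):
    """Greedily pick sentences from `pool` that raise the dev log-likelihood.

    Raising the log-likelihood is equivalent to lowering perplexity, since
    the number of development tokens is fixed. Returns indices into `pool`
    in the order they were selected.
    """
    dev_counts = Counter(w for sent in dev for w in sent)
    vocab = set(dev_counts) | {w for sent in pool for w in sent}
    vocab_size = len(vocab)

    counts, total = Counter(), 0
    current_ll = dev_log_likelihood(counts, total, dev_counts, vocab_size)
    selected = []
    remaining = list(range(len(pool)))

    while remaining and len(selected) < max_sentences:
        best_i, best_gain = None, 0.0
        for i in remaining:
            trial_counts = counts + Counter(pool[i])
            trial_ll = dev_log_likelihood(
                trial_counts, total + len(pool[i]), dev_counts, vocab_size
            )
            gain = trial_ll - current_ll
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:
            # No remaining sentence improves the objective; stop early.
            break
        counts.update(pool[best_i])
        total += len(pool[best_i])
        current_ll += best_gain
        selected.append(best_i)
        remaining.remove(best_i)
    return selected


pool = [["stock", "market"], ["speech", "recognition"], ["speech", "model"]]
dev = [["speech", "recognition"], ["speech", "recognition", "model"]]
print(greedy_select(pool, dev, max_sentences=3))
```

In this toy setting the search stops on its own once no remaining sentence improves the development-set likelihood, which is one simple way a selection of this kind can yield a compact model. A real system would use an n-gram model and a more efficient search than re-scoring every candidate at each step.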
Similar resources
Joint and Coupled Bilingual Topic Model Based Sentence Representations for Language Model Adaptation
This paper is concerned with data selection for adapting the language model (LM) in statistical machine translation (SMT), and aims to find the LM training sentences that are topically similar to the translation task. Although the traditional approaches have achieved significant performance gains, they ignore the topic information and the distribution information of words when selecting similar training senten...
Discounted likelihood linear regression for rapid speaker adaptation
The widely used maximum likelihood linear regression speaker adaptation procedure suffers from overtraining when used for rapid adaptation tasks in which the amount of adaptation data is severely limited. This is a well known difficulty associated with the expectation maximization algorithm. We use an information geometric analysis of the expectation maximization algorithm as an alternating min...
Assessing the Translation of Parvin Etesami's Selected Poems Using Vinay and Darbelnet’s Model
Translators always seek to find the best equivalents for each word, sentence or phrase in the target language (TL) in order to have the most accurate and meaningful translation of the text. Generally, a translator’s main concern is whether to prefer the form over the content or vice versa. In translation studies, literal translation prioritizes the form while free translation concentrates on th...
Combining a mixture language model and Naive Bayes for multi-document summarisation
The TNO system for multi-document summarisation is based on an extraction approach. We combined two statistical methods for sentence selection with a variant of the MMR algorithm. After sentence segmentation, each sentence is scored on the basis of two probabilistic models. The first model scores sentences based on a (generative) unigram language model, which is a mixture of a cluster model, a ...
A Supplier Selection Model for Social Responsible Supply Chain
Due to the importance of the supplier selection issue in supply chain management (SCM) and the increasing tendency of organizations to attend to their social responsibilities, in this paper we survey the supplier selection issue as a multi-objective problem while considering the factor of corporate social responsibility (CSR) as a mathematical parameter. The purpose of this paper is to design a mode...
Publication date: 2011